Biologists pioneer first method to decode gene expression

Genomic 'Rosetta Stone' taps powerful algorithm to identify expressible genes at near-perfect accuracy

Date: August 12, 2019
Source: University of California - San Diego
Summary: Biologists have developed the first system for determining gene expression based on machine learning. Considered a type of genetic Rosetta Stone for biologists, the new method leverages algorithms trained on a set of known plant genes to determine a species-wide set of transcribed genes, or 'expressome,' then creates an atlas of expressible genes. The method carries implications across biology, from drug discovery to plant breeding to evolution.
FULL STORY

Given the recent remarkable advancements in genetics, it's easy to assume that 21st-century scientists have at their disposal a clear, quick way to scan a genomic sequence and find out which genes among thousands can be expressed and which cannot. Gene expression is the process by which information encoded within genes leads to key products, such as proteins.

Surprisingly, that hasn't been possible until now. Biologists at the University of California San Diego have developed the first system for determining gene expression based on machine learning. Because no such method existed before, the new process is considered a type of genetic Rosetta Stone for biologists.

"This paper represents the first method to distinguish genes that can be expressed from those that cannot," said Steve Briggs, a Division of Biological Sciences professor and senior author of the paper. "This is the basis for all of biology. Whether it's drug discovery or plant breeding or evolution, this touches the basic studies of biology."

The method, developed by graduate student Ryan Sartor, Briggs and their colleagues, was described August 12, 2019, in the Proceedings of the National Academy of Sciences.

Biologists have previously classified gene expression through experimental observations and scientific literature references. But the genomics field lacked a formalized process for identifying the "expressible gene set," or EGS, which comprises all protein-coding genes with the potential to be expressed.

"In biology, there is no method to do this," said Briggs. "In the past we've just had empirical approaches to making catalogs -- we haven't had scientific criteria that classifies the genes based on their molecular features."

The new method relies on machine learning, in which algorithms learn patterns from example data, and is based on a training set of nearly 30,000 maize genes, each described by specific, detailed molecular features. An advanced algorithm was trained on the data and "learned" to classify gene expression with 99.4 percent accuracy.
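As a rough illustration of the kind of supervised classification described above, the sketch below trains a simple logistic-regression classifier on synthetic data and scores it on held-out examples. It is not the authors' algorithm, which this article does not specify. The two features (an "accessibility" score and a "methylation" score), the function names, and every number in it are hypothetical, chosen only to show the train-then-score workflow.

```python
import math
import random


def make_synthetic_genes(n, seed=0):
    """Return n hypothetical (features, label) pairs. NOT real maize data."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n):
        expressible = rng.random() < 0.5
        # Two made-up features, roughly in [0, 1] (assumption for illustration).
        accessibility = rng.gauss(0.7 if expressible else 0.3, 0.1)
        methylation = rng.gauss(0.2 if expressible else 0.8, 0.1)
        genes.append(((accessibility, methylation), 1 if expressible else 0))
    return genes


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, epochs=200, lr=0.5):
    """Fit weights and bias with plain batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Return 1 (expressible) or 0 (not expressible) for one feature pair."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in data) / len(data)


train_set = make_synthetic_genes(400, seed=1)
test_set = make_synthetic_genes(200, seed=2)  # held out from training
model = train_logistic(train_set)
test_accuracy = accuracy(model, test_set)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Accuracy here means the same thing as in the figure quoted above: the share of genes whose predicted class matches the known class. The synthetic classes are deliberately easy to separate, so the score this toy prints says nothing about the performance of the published method.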

The key to the advance is bringing together chromatin biology, which concerns how DNA is packaged within cells, with molecular features that are known to determine gene expression. Combined with machine learning, these inputs allow the new method to determine the species-wide set of transcribed genes, or "expressome," and then create an atlas of expressible genes. The method may also be useful in understanding evolutionary mechanisms that silence certain genes.

Briggs is now applying the method to sorghum, an important grain for food and fodder, but says it can be useful beyond plant species. Ultimately, he says the new method is like a word decoder.

"The genome sequence is like a book," said Briggs. "The words are the genes. Until now, we couldn't tell which DNA sequences were real words and which merely resembled words. By removing non-words we now have a much more accurate reading of the book."

Coauthors of the paper include Jaclyn Noshay and Nathan Springer of the University of Minnesota. The National Science Foundation's Plant Genome Research Program supported the research.


Story Source:

Materials provided by University of California - San Diego. Original written by Mario Aguilera. Note: Content may be edited for style and length.


Journal Reference:

  1. Ryan C. Sartor et al. Identification of the expressome by machine learning on omics data. PNAS, 2019 DOI: 10.1073/pnas.1813645116
